Refactor: refine tensor dependency tracking #415

Merged
poursoul merged 1 commit into hw-native-sys:main from jvjhfhg:fix/orch-semantics
Apr 1, 2026


@jvjhfhg jvjhfhg commented Mar 31, 2026

  • Add OUTPUT_EXISTING and NO_DEP handling for existing tensors, and split creator retention from overlap-based writer lookup.
  • Store owner_task_id in Tensor, remove the CreatorMap path, and update affected orchestration examples to use the refined dependency semantics.
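
The split between creator retention and overlap-based writer lookup can be pictured with a small sketch. Only `OUTPUT_EXISTING`, `NO_DEP`, and `owner_task_id` come from this description; the other enum values, `WriterEntry`, `collect_deps`, and the per-type policy (here, `NO_DEP` keeps the creator but skips the overlap lookup) are assumptions made for illustration and may not match the merged code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for "no creator task recorded".
constexpr int32_t kNoTask = -1;

// INPUT, OUTPUT and INOUT are assumed here; OUTPUT_EXISTING and NO_DEP
// are the variants named in the PR description.
enum class TensorArgType { INPUT, OUTPUT, INOUT, OUTPUT_EXISTING, NO_DEP };

struct Tensor {
    uint64_t addr;
    uint64_t size;
    int32_t owner_task_id;  // task that created the buffer, or kNoTask
};

// Hypothetical stand-in for one entry of an overlap map of recent writers.
struct WriterEntry {
    uint64_t addr;
    uint64_t size;
    int32_t task_id;
};

// True when the half-open byte ranges of t and w intersect.
inline bool overlaps(const Tensor& t, const WriterEntry& w) {
    return t.addr < w.addr + w.size && w.addr < t.addr + t.size;
}

// Collect the task ids one argument makes the new task depend on.
std::vector<int32_t> collect_deps(const Tensor& t, TensorArgType type,
                                  const std::vector<WriterEntry>& writers) {
    std::vector<int32_t> deps;

    // Step 1: creator retention, driven only by the field stored in Tensor.
    if (t.owner_task_id != kNoTask) {
        deps.push_back(t.owner_task_id);
    }

    // Step 2: overlap-based writer lookup. Assumed policy: NO_DEP skips it.
    if (type == TensorArgType::NO_DEP) {
        return deps;
    }
    for (const WriterEntry& w : writers) {
        if (overlaps(t, w) && w.task_id != t.owner_task_id) {
            deps.push_back(w.task_id);
        }
    }
    return deps;
}
```

Because the creator is read from the tensor itself, the first step needs no separate map, which is one plausible reading of why the CreatorMap path could be removed.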

@jvjhfhg jvjhfhg force-pushed the fix/orch-semantics branch from c4a6952 to 19d6c42 Compare March 31, 2026 02:55

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the dependency tracking system by introducing creator-based tracking via a new owner_task_id field in the Tensor structure, complementing the existing OverlapMap lookups. It adds new TensorArgType variants, OUTPUT_EXISTING and NO_DEP, to better handle different buffer lifecycles and reduces complexity in the TensorMap by removing the with_alloc flag. Feedback focuses on critical safety issues where the fanin_count limit is silently enforced, which could lead to dropped dependencies and data races. Additionally, it is recommended that OUTPUT_EXISTING perform full OverlapMap lookups to maintain correctness and prevent stale entries in the TensorMap.
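
The concern about a silently enforced fan-in limit can be illustrated with a bounded list that reports overflow to the caller rather than dropping the edge. This is a minimal sketch under assumed names: `kMaxFanin`, `FaninList`, and `add` are hypothetical, and the real limit and storage layout in `pto_orchestrator.cpp` are not shown in this conversation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical capacity; the real fan-in limit is not stated in this thread.
constexpr std::size_t kMaxFanin = 4;

struct FaninList {
    std::array<int32_t, kMaxFanin> producers{};
    std::size_t fanin_count = 0;

    // Records a producer task. Returns false when the list is full, so the
    // caller has to handle the overflow (fail the submit, wait, or fall back
    // to a coarser barrier) instead of losing the dependency unnoticed.
    [[nodiscard]] bool add(int32_t producer) {
        for (std::size_t i = 0; i < fanin_count; ++i) {
            if (producers[i] == producer) {
                return true;  // already tracked, nothing to store
            }
        }
        if (fanin_count == kMaxFanin) {
            return false;  // surface the limit to the caller
        }
        producers[fanin_count++] = producer;
        return true;
    }
};
```

A caller that ignores a `false` result would reintroduce the data race the review describes, which is why the return value is marked `[[nodiscard]]` in this sketch.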

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp Outdated
Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp Outdated
Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp Outdated
Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp Outdated
@jvjhfhg jvjhfhg force-pushed the fix/orch-semantics branch from 19d6c42 to 1dbc46e Compare March 31, 2026 02:59
Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp Outdated
Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_task_id.h Outdated
@jvjhfhg jvjhfhg force-pushed the fix/orch-semantics branch 14 times, most recently from a37a79f to f11a30c Compare March 31, 2026 10:51
@jvjhfhg jvjhfhg marked this pull request as draft March 31, 2026 11:49
@jvjhfhg jvjhfhg force-pushed the fix/orch-semantics branch from f11a30c to 61a5e0a Compare March 31, 2026 12:19
@jvjhfhg jvjhfhg marked this pull request as ready for review March 31, 2026 12:22
poursoul
poursoul previously approved these changes Apr 1, 2026
@jvjhfhg jvjhfhg marked this pull request as draft April 1, 2026 01:23
@jvjhfhg jvjhfhg force-pushed the fix/orch-semantics branch from 61a5e0a to b9220d3 Compare April 1, 2026 01:59
@jvjhfhg jvjhfhg marked this pull request as ready for review April 1, 2026 01:59
@jvjhfhg jvjhfhg force-pushed the fix/orch-semantics branch from b9220d3 to 2006f05 Compare April 1, 2026 02:00
@jvjhfhg jvjhfhg force-pushed the fix/orch-semantics branch from 2006f05 to a05f801 Compare April 1, 2026 02:02
@poursoul poursoul merged commit 4917d12 into hw-native-sys:main Apr 1, 2026
16 checks passed
@jvjhfhg jvjhfhg deleted the fix/orch-semantics branch April 1, 2026 02:09
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Apr 1, 2026
…ndency tracking

Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern.

Platform (src/a5/platform/):
- 012675a (hw-native-sys#419): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements
- fe63325 (hw-native-sys#403): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 012675a (hw-native-sys#419): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- 7059fff (hw-native-sys#389): Encapsulate TaskOutputTensors materialization in orchestrator
- 27a85c8 (hw-native-sys#390): Const-qualify set_tensor_data Tensor parameter
- 1d97ac5 (hw-native-sys#395): Arg inherits TaskArgs, simplify orchestrator arg passing
- 121a1d5 (hw-native-sys#387): Add to_u64/from_u64 type-safe conversion utilities in orchestration API
- fe63325 (hw-native-sys#403): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry
- cd59b47 (hw-native-sys#404): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext
- 4917d12 (hw-native-sys#415): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking
- 34a6e1c (hw-native-sys#417): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions

Tests (examples/a5/, tests/st/a5/):
- be765f1 (hw-native-sys#392): Migrate paged_attention orchestration to ChipStorageTaskArgs API
- 121a1d5 (hw-native-sys#387): Use from_u64<float> in softmax_prepare kernels
- fe63325 (hw-native-sys#403): Add license headers, NOLINT annotations, output tensor view(true)
- cd59b47 (hw-native-sys#404): Add spmd_basic example (AIC+AIV SPMD read test)
- 34a6e1c (hw-native-sys#417): Add spmd_multiblock_aiv and spmd_multiblock_mix examples
- Remove redundant end-of-kernel sync barriers in paged_attention test kernels
- Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Apr 1, 2026
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Apr 1, 2026
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Apr 1, 2026
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Apr 1, 2026
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Apr 2, 2026
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Apr 2, 2026
…ndency tracking

Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern.

Platform (src/a5/platform/):
- 012675a ([[[[[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements
- fe63325 ([[[[[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 012675a ([[[[[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- 7059fff ([hw-native-sys#389](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate TaskOutputTensors materialization in orchestrator
- 27a85c8 ([hw-native-sys#390](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter
- 1d97ac5 ([hw-native-sys#395](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing
- 121a1d5 ([hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API
- fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry
- cd59b47 ([hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext
- 4917d12 ([hw-native-sys#415](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking
- 34a6e1c ([hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions

Tests (examples/a5/, tests/st/a5/):
- be765f1 ([hw-native-sys#392](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API
- 121a1d5 ([hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels
- fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true)
- cd59b47 ([hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test)
- 34a6e1c ([hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples
- Remove redundant end-of-kernel sync barriers in paged_attention test kernels
- Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
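
For readers unfamiliar with the `to_u64`/`from_u64` utilities mentioned in 121a1d5, the following is a minimal sketch of how such type-safe conversions might look. Only the two function names and the `from_u64<float>` call form come from the commit message; the signatures, the `memcpy`-based implementation, and the `check_round_trip` helper are assumptions made for illustration, not the repository's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical sketch: pack a small trivially copyable value into a 64-bit
// slot, e.g. to pass a scalar through a uint64_t-based argument list.
template <typename T>
std::uint64_t to_u64(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    static_assert(sizeof(T) <= sizeof(std::uint64_t), "T must fit in 64 bits");
    std::uint64_t bits = 0;                  // bytes not covered by T stay zero
    std::memcpy(&bits, &value, sizeof(T));   // copy the object representation
    return bits;
}

// Hypothetical inverse: recover a value of type T from the 64-bit slot.
template <typename T>
T from_u64(std::uint64_t bits) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    static_assert(sizeof(T) <= sizeof(std::uint64_t), "T must fit in 64 bits");
    T value;
    std::memcpy(&value, &bits, sizeof(T));   // read back the same leading bytes
    return value;
}

// Illustrative helper (not from the source): values survive a round trip.
inline void check_round_trip() {
    assert(from_u64<float>(to_u64(0.125f)) == 0.125f);
    assert(from_u64<std::int32_t>(to_u64(std::int32_t{-7})) == -7);
}
```

Copying bytes with `std::memcpy` instead of casting pointers avoids strict-aliasing problems, and the `static_assert` checks reject types that cannot be stored safely in a 64-bit slot at compile time.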
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Apr 2, 2026
…ndency tracking

ChaoWao pushed a commit that referenced this pull request Apr 2, 2026
…ndency tracking (#426)

Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern.

Platform (src/a5/platform/):
- 012675a ([#419](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements
- fe63325 ([#403](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 012675a ([#419](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- 7059fff ([#389](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate TaskOutputTensors materialization in orchestrator
- 27a85c8 ([#390](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter
- 1d97ac5 ([#395](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing
- 121a1d5 ([#387](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API
- fe63325 ([#403](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry
- cd59b47 ([#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext
- 4917d12 ([#415](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking
- 34a6e1c ([#417](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions

Tests (examples/a5/, tests/st/a5/):
- be765f1 ([#392](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API
- 121a1d5 ([#387](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels
- fe63325 ([#403](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true)
- cd59b47 ([#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test)
- 34a6e1c ([#417](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples
- Remove redundant end-of-kernel sync barriers in paged_attention test kernels
- Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)